Mining Approximate Top-K Subspace Anomalies in Multi-Dimensional Time-Series Data

نویسندگان

  • Xiaolei Li
  • Jiawei Han
چکیده

Market analysis is a representative data analysis process with many applications. In such an analysis, critical numerical measures, such as profit and sales, fluctuate over time and form time-series data. Moreover, the time series data correspond to market segments, which are described by a set of attributes, such as age, gender, education, income level, and product-category, that form a multi-dimensional structure. To better understand market dynamics and predict future trends, it is crucial to study the dynamics of time-series in multi-dimensional market segments. This is a topic that has been largely ignored in time series and data cube research. In this study, we examine the issues of anomaly detection in multi-dimensional time-series data. We propose timeseries data cube to capture the multi-dimensional space formed by the attribute structure. This facilitates the detection of anomalies based on expected values derived from higher level, “more general” time-series. Anomaly detection in a time-series data cube poses computational challenges, especially for high-dimensional, large data sets. To this end, we also propose an efficient search algorithm to iteratively select subspaces in the original high-dimensional space and detect anomalies within each one. Our experiments with both synthetic and real-world data demonstrate the effectiveness and efficiency of the proposed solution.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Clustering in applications with multiple data sources - A mutual subspace clustering approach

In many applications, such as bioinformatics and cross-market customer relationship management, there are data from multiple sources jointly describing the same set of objects. An important data mining task is to find interesting groups of objects that form clusters in subspaces of the data sources jointly supported by those data sources. In this paper, we study a novel problem of mining mutual...

متن کامل

DensEst: Density Estimation for Data Mining in High Dimensional Spaces

Subspace clustering and frequent itemset mining via “stepby-step” algorithms that search the subspace/pattern lattice in a top-down or bottom-up fashion do not scale to large high dimensional data bases. Recent “jump” algorithms directly choose candidate subspace regions or patterns. Their scalability and quality depend heavily on the rating of these candidates as mislead jumps incur poor resul...

متن کامل

Fuzzy clustering of time series data: A particle swarm optimization approach

With rapid development in information gathering technologies and access to large amounts of data, we always require methods for data analyzing and extracting useful information from large raw dataset and data mining is an important method for solving this problem. Clustering analysis as the most commonly used function of data mining, has attracted many researchers in computer science. Because o...

متن کامل

Optimal Centroid Estimation Scheme for Multi Dimensional Clustering

High dimensional data values are processed and optimized with feature selection process. A feature selection algorithm is constructed with the consideration of efficiency and effectiveness factors. The efficiency concerns the time required to find a subset of features. The effectiveness is related to the quality of the subset of features. 3 dimensional data models are constructed with object, a...

متن کامل

A Novel Noise Reduction Method Based on Subspace Division

This article presents a new subspace-based technique for reducing the noise of signals in time-series. In the proposed approach, the signal is initially represented as a data matrix. Then using Singular Value Decomposition (SVD), noisy data matrix is divided into signal subspace and noise subspace. In this subspace division, each derivative of the singular values with respect to rank order is u...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007